ABSTRACT

The horizontal distance $\Delta(x) = G^{-1}(F(x)) - x$ has been shown by Doksum
(1974) to be a useful measure of the difference, at each $x$, between the
populations defined by continuous distribution functions $F(x)$ and $G(x)$.
Here we assume that $G$ is known, and we develop a Bayesian nonparametric
estimator $\tilde{\Delta}_n(x)$ of $\Delta(x)$ based on a random sample of $n$ X's from $F$. The
estimator $\tilde{\Delta}_n$ is, for weighted squared-error loss, Bayes with respect to
Ferguson's (1973) Dirichlet process prior. Using a result of Korwar and
Hollander (1976), the Bayes risk of $\tilde{\Delta}_n$ is evaluated for the case when $G$ is
uniform.

1. Introduction.

When  F  and  G  are  continuous  distribution  functions,  the  horizontal 
distance 

$$\Delta(x) = G^{-1}(F(x)) - x, \quad x \text{ real}, \tag{1}$$

has been shown by Doksum (1974) to be a useful measure of the difference,
at each $x$, between $F$ and $G$. Under suitable regularity, Doksum shows that
$\Delta(x)$ is essentially the only function satisfying

$$X + \Delta(X) \sim Y, \tag{2}$$

where, in (2), $X$ is distributed according to $F$, $Y$ is distributed according to
$G$, and $\sim$ means "has the same distribution as."

When the linear model

$$F(x) = G(x + \Delta), \quad \text{for all } x, \tag{3}$$

holds, where $\Delta$ is a constant, then $\Delta(x) = \Delta$ (and, of course, when $F = G$,
$\Delta(x) \equiv 0$).
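
The shift-model identity can be checked numerically. The sketch below evaluates $\Delta(x) = G^{-1}(F(x)) - x$ at a few points; the normal distributions, the shift value 1.5, and the helper name `horizontal_distance` are illustrative assumptions and do not come from the paper.

```python
from statistics import NormalDist

def horizontal_distance(x, F, G):
    """Delta(x) = G^{-1}(F(x)) - x for continuous distributions F and G."""
    return G.inv_cdf(F.cdf(x)) - x

# Assumed example: F = N(0, 1) and G(y) = F(y - shift), so F(x) = G(x + shift).
shift = 1.5
F = NormalDist(mu=0.0, sigma=1.0)
G = NormalDist(mu=shift, sigma=1.0)

deltas = [horizontal_distance(x, F, G) for x in (-2.0, -0.5, 0.0, 1.0, 2.5)]
print(deltas)  # each entry should be close to the shift
```

Under these assumptions the printed values agree with the constant shift up to floating-point error, which is the statement $\Delta(x) = \Delta$ of the linear model.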

When one observes a random sample of $n$ X's from $F$ and an independent
random sample of $m$ Y's from $G$, Doksum suggests estimating $\Delta(x)$ by

$$\hat{\Delta}_N(x) = G_m^{-1}(F_n(x)) - x, \tag{4}$$

where $N = m + n$ and $F_n$, $G_m$ are the empirical distribution functions based on
the X's and Y's, respectively. Doksum also derives a simultaneous confidence
band for $\Delta(x)$ and shows that $N^{1/2}\{\hat{\Delta}_N(x) - \Delta(x)\}$ converges weakly to a Gaussian
process.

In this paper we consider the one-sample problem where $G$ is known and
(just) a random sample of $n$ X's from $F$ is available for estimating $\Delta(x)$.

One natural estimator for this problem is the one-sample limit ($m \to \infty$) of
Doksum's estimator $\hat{\Delta}_N$. This one-sample limit is

$$\Delta_n(x) = G^{-1}(F_n(x)) - x. \tag{5}$$

The estimator $\Delta_n$ does not utilize prior information about the unknown $F$. Our
approach is Bayesian and leads to an estimator $\tilde{\Delta}_n$ which does use prior information
about $F$.
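
A minimal sketch of the one-sample limit $G^{-1}(F_n(x)) - x$ follows. The function names, the toy sample, and the choice of $G$ as uniform on $[0, 1]$ are assumptions made only for illustration.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of the sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def one_sample_delta(x, sample, g_inverse):
    """Delta_n(x) = G^{-1}(F_n(x)) - x, with G^{-1} supplied by the caller."""
    return g_inverse(empirical_cdf(sample)(x)) - x

# Assumed example: G uniform on [0, 1], so G^{-1}(p) = p.
xs = [0.1, 0.4, 0.7, 0.9]
uniform_inverse = lambda p: p
print(one_sample_delta(0.5, xs, uniform_inverse))
```

When $G$ has unbounded support, $G^{-1}(F_n(x))$ can be infinite at points where $F_n(x)$ equals 0 or 1, so a caller would need to restrict $x$ or handle those values separately.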

We assume that $F$ is a random distribution function chosen according to
Ferguson's (1973) Dirichlet process prior (Definition 2.2) with parameter $\alpha(\cdot)$,
a completely specified measure on the real line $R$ with the Borel $\sigma$-field $\mathcal{B}$.

A drawback of this approach is that the randomly chosen $F$ will not be continuous
(Ferguson's Dirichlet process prior chooses, with probability one, a discrete
distribution), and thus the desirability of estimating $\Delta(x)$ is slightly
diminished. Nevertheless, in this case $\Delta(x)$ remains a useful measure of the
distance between $F$ and $G$ at $x$, and the resulting estimator $\tilde{\Delta}_n(x)$ combines
sample information and prior information in an effective manner.

Our loss function is

$$L(\hat{\Delta}, \Delta) = \int (\hat{\Delta}(x) - \Delta(x))^2 \, dW(x), \tag{6}$$

where $\hat{\Delta}$ is an estimator of $\Delta$ and $W$ is a finite measure on $(R, \mathcal{B})$. A general
expression for the Bayes estimator $\tilde{\Delta}_n$ is given in Section 3, and explicit
expressions for $\tilde{\Delta}_n$ are obtained for the cases when $G$ is (i) exponential and
(ii) uniform. Furthermore, in the uniform case we derive the Bayes risk of
$\tilde{\Delta}_n$. Section 2 contains preliminaries relating to the Dirichlet process.

2. Dirichlet Process Preliminaries.

This section briefly gives some definitions and theorems associated with
the Dirichlet process. For further details the reader is referred to
Ferguson (1973).


DEFINITION 2.1 (Ferguson). Let $Z_1, \ldots, Z_k$ be independent random
variables with $Z_j$ having a gamma distribution with shape parameter $\alpha_j \ge 0$ and
scale parameter 1, $j = 1, \ldots, k$. Let $\alpha_j > 0$ for some $j$. The Dirichlet
distribution with parameter $(\alpha_1, \ldots, \alpha_k)$, denoted by $D(\alpha_1, \ldots, \alpha_k)$, is
defined as the distribution of $(Y_1, \ldots, Y_k)$, where $Y_j = Z_j / \sum_{i=1}^{k} Z_i$, $j = 1, \ldots, k$.

Since $\sum_{i=1}^{k} Y_i = 1$, the Dirichlet distribution is singular with respect
to Lebesgue measure in $k$-dimensional space. If $\alpha_j = 0$, the corresponding $Y_j$
is degenerate at zero. If however $\alpha_j > 0$ for all $j$, the $(k - 1)$-dimensional
distribution of $(Y_1, \ldots, Y_{k-1})$ is absolutely continuous with density

$$f(y_1, \ldots, y_{k-1} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \Big( \prod_{j=1}^{k-1} y_j^{\alpha_j - 1} \Big) \Big( 1 - \sum_{j=1}^{k-1} y_j \Big)^{\alpha_k - 1} I_S(y_1, \ldots, y_{k-1}), \tag{7}$$

where $S$ is the simplex $S = \{(y_1, \ldots, y_{k-1}) : y_j \ge 0, \ \sum_{j=1}^{k-1} y_j \le 1\}$.
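
The normalized-gamma construction in Definition 2.1 translates directly into a sampler. The sketch below is an illustration under stated assumptions: the function name `sample_dirichlet`, the parameter values, and the fixed seed are hypothetical choices, and a shape parameter of 0 is handled by returning a point mass at zero, as the definition prescribes.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw (Y_1, ..., Y_k) with Y_j = Z_j / sum(Z_i), Z_j ~ Gamma(alpha_j, 1).

    A shape parameter of 0 gives a coordinate that is degenerate at zero.
    """
    if min(alphas) < 0 or max(alphas) <= 0:
        raise ValueError("need alpha_j >= 0 for all j and alpha_j > 0 for some j")
    zs = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alphas]
    total = sum(zs)
    return [z / total for z in zs]

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
ys = sample_dirichlet([2.0, 0.0, 1.5], rng)
print(ys, sum(ys))
```

The coordinates are nonnegative and sum to one, and the coordinate with $\alpha_j = 0$ is exactly zero.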

DEFINITION 2.2 (Ferguson). Let $(\mathcal{X}, \mathcal{A})$ be a measurable space. Let $\alpha$ be a
non-null finite measure (nonnegative and finitely additive) on $(\mathcal{X}, \mathcal{A})$. We say
$P$ is a Dirichlet process on $(\mathcal{X}, \mathcal{A})$ with parameter $\alpha$ if for every $k = 1, 2, \ldots$,
and measurable partition $(B_1, \ldots, B_k)$ of $\mathcal{X}$, the distribution of
$(P(B_1), \ldots, P(B_k))$ is Dirichlet with parameter $(\alpha(B_1), \ldots, \alpha(B_k))$.


DEFINITION 2.3 (Ferguson). The $\mathcal{X}$-valued random variables $X_1, \ldots, X_n$
constitute a sample of size $n$ from a Dirichlet process $P$ on $(\mathcal{X}, \mathcal{A})$ with
parameter $\alpha$ if for any $m = 1, 2, \ldots$ and measurable sets $A_1, \ldots, A_m$, $C_1, \ldots, C_n$,

$$Q\{X_1 \in C_1, \ldots, X_n \in C_n \mid P(A_1), \ldots, P(A_m), P(C_1), \ldots, P(C_n)\} = \prod_{i=1}^{n} P(C_i) \quad \text{a.s.},$$

where $Q$ denotes probability.


THEOREM 2.4 (Ferguson). Let $P$ be a Dirichlet process on $(\mathcal{X}, \mathcal{A})$ with
parameter $\alpha$, and let $X_1, \ldots, X_n$ be a sample of size $n$ from $P$. Then the
conditional distribution of $P$ given $X_1, \ldots, X_n$ is a Dirichlet process on
$(\mathcal{X}, \mathcal{A})$ with parameter $\beta = \alpha + \sum_{i=1}^{n} \delta_{X_i}$, where, for $x \in \mathcal{X}$, $A \in \mathcal{A}$, $\delta_x(A) = 1$ if
$x \in A$, 0 otherwise.

3. A Bayes Estimator of the Horizontal Distance.

We suppose that $F$ is chosen according to a Dirichlet process prior on
$(R, \mathcal{B})$ with parameter $\alpha$. With the loss function given by (6), the Bayes
estimator for the no-sample problem is found by minimizing the right-hand side
of (8),

$$E L(\hat{\Delta}, \Delta) = \int E(\hat{\Delta}(x) - \Delta(x))^2 \, dW(x), \tag{8}$$

where the expectation is with respect to $F$. The estimator is obtained by
minimizing $E(\hat{\Delta}(x) - \Delta(x))^2$ for each $x$, yielding

$$\tilde{\Delta}(x) = E(\Delta(x)) = E\{G^{-1}(F(x))\} - x. \tag{9}$$

We next evaluate (9) in the cases where (i) $G$ is exponential and (ii) $G$ is
uniform.


3.1. The Case Where G is Exponential: Let $G(x) = 1 - \exp(-\lambda x)$, $x > 0$,
and 0 for $x \le 0$, for some $\lambda > 0$. Then

$$G^{-1}(x) = -\lambda^{-1} \ln(1 - x), \quad 0 < x < 1,$$

and (9) reduces to

$$\tilde{\Delta}(x) = \{B(\alpha', \beta')\}^{-1} \int_0^1 \{-\lambda^{-1} \ln(1 - y)\} \, y^{\alpha' - 1} (1 - y)^{\beta' - 1} \, dy - x, \tag{10}$$

where $B(\alpha', \beta') = \Gamma(\alpha')\Gamma(\beta')/\Gamma(\alpha' + \beta')$. Equation (10) makes use of the fact
that for each $x$, $F(x)$ is distributed according to the Beta distribution with
parameters $\alpha' = \alpha((-\infty, x])$, $\beta' = \alpha(R) - \alpha'$. (To see this use Definition 2.2
with the measurable partition $B_1 = (-\infty, x]$, $B_2 = R - B_1$.) Thus, for the
"no-sample" problem, by expanding $\ln(1 - y)$ in a power series, we obtain

$$\tilde{\Delta}(x) = \{\lambda B(\alpha', \beta')\}^{-1} \int_0^1 \sum_{j=1}^{\infty} j^{-1} y^{\alpha' + j - 1} (1 - y)^{\beta' - 1} \, dy - x$$

$$= \lambda^{-1} \sum_{j=1}^{\infty} \big[ B(\alpha' + j, \beta') / \{ j B(\alpha', \beta') \} \big] - x.$$

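Using Theorem 2.4, the Bayes estimator when a sample $X_1, \ldots, X_n$ is
available from $F$, is

$$\tilde{\Delta}_n(x) = \lambda^{-1} \sum_{j=1}^{\infty} \big[ B(\alpha'' + j, \beta'') / \{ j B(\alpha'', \beta'') \} \big] - x, \tag{11}$$

where

$$\alpha'' = \alpha((-\infty, x]) + \sum_{i=1}^{n} \delta_{X_i}((-\infty, x]),$$

$$\beta'' = \alpha(R) + n - \alpha''.$$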
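
The series in (11) can be evaluated by truncation. The sketch below is one way to do this under stated assumptions: the function names, the truncation length `terms`, the prior measure $\alpha = 5 F_0$ with $F_0$ exponential with rate 1, and the toy sample are all hypothetical choices for illustration. Logarithms of Beta functions are used to avoid overflow in the Gamma functions.

```python
from math import exp, lgamma

def log_beta(a, b):
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def exponential_bayes_estimate(x, prior_measure, alpha_total, sample, lam, terms=2000):
    """Truncated series for the Bayes estimator when G is exponential(lam).

    prior_measure(x) returns alpha((-inf, x]); alpha_total is alpha(R).
    """
    count = sum(1 for xi in sample if xi <= x)
    a2 = prior_measure(x) + count             # alpha''
    b2 = alpha_total + len(sample) - a2       # beta''
    if a2 <= 0 or b2 <= 0:
        raise ValueError("alpha'' and beta'' must both be positive")
    base = log_beta(a2, b2)
    series = sum(exp(log_beta(a2 + j, b2) - base) / j for j in range(1, terms + 1))
    return series / lam - x

# Assumed prior measure: alpha = 5 * F_0, with F_0 exponential with rate 1.
prior = lambda x: 5.0 * (1.0 - exp(-x)) if x > 0 else 0.0
print(exponential_bayes_estimate(1.0, prior, 5.0, [0.3, 1.2, 2.0], lam=1.0))
```

The terms of the series decay roughly like $j^{-\beta'' - 1}$, so the truncation is slow to converge when $\beta''$ is small. As a sanity check, when $\alpha''$ is a positive integer the infinite sum equals $\sum_{k=0}^{\alpha'' - 1} 1/(\beta'' + k)$.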

3.2. The Case Where G is Uniform: Let $G(x) = 0$ for $x < a$, $(x - a)/(b - a)$
for $a \le x \le b$, and 1 for $x > b$, for some $a < b$. Then (9) reduces to

$$\tilde{\Delta}(x) = \int_0^1 \{y(b - a) + a\} \{B(\alpha', \beta')\}^{-1} y^{\alpha' - 1} (1 - y)^{\beta' - 1} \, dy - x$$

$$= a + (b - a)\{B(\alpha' + 1, \beta')/B(\alpha', \beta')\} - x$$

$$= a + (b - a)\{\alpha'/(\alpha' + \beta')\} - x$$

$$= a + (b - a)F_0(x) - x,$$

where

$$F_0(x) = \alpha((-\infty, x])/\alpha(R), \quad x \in R,$$

can be interpreted as the "prior guess" at $F$.


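Thus, from Theorem 2.4, when a sample $X_1, \ldots, X_n$ is available from $F$,
the Bayes estimator is

$$\tilde{\Delta}_n(x) = a + (b - a)\hat{F}_n(x) - x, \quad x \in R, \tag{12}$$

where

$$\hat{F}_n(x) = \Big\{ \alpha((-\infty, x]) + \sum_{i=1}^{n} \delta_{X_i}((-\infty, x]) \Big\} \Big/ (\alpha(R) + n).$$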
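
The uniform-case estimator has a closed form, so it is straightforward to compute. The sketch below assumes a prior measure $\alpha = M F_0$ with $M = 2$ and $F_0$ uniform on $[0, 1]$; these values, the toy sample, and the function names are illustrative assumptions only.

```python
def posterior_mean_cdf(x, prior_measure, alpha_total, sample):
    """{alpha((-inf, x]) + #{X_i <= x}} / (alpha(R) + n)."""
    count = sum(1 for xi in sample if xi <= x)
    return (prior_measure(x) + count) / (alpha_total + len(sample))

def uniform_bayes_estimate(x, a, b, prior_measure, alpha_total, sample):
    """a + (b - a) * (posterior mean of F(x)) - x, for G uniform on [a, b]."""
    return a + (b - a) * posterior_mean_cdf(x, prior_measure, alpha_total, sample) - x

# Assumed prior: alpha = M * F_0 with M = 2 and F_0 uniform on [0, 1].
M = 2.0
prior = lambda x: M * min(max(x, 0.0), 1.0)
xs = [0.1, 0.4, 0.7, 0.9]
print(uniform_bayes_estimate(0.5, 0.0, 1.0, prior, M, xs))
```

The posterior mean of $F(x)$ is a weighted average of the prior guess $F_0(x)$ and the empirical distribution function $F_n(x)$, with weights $\alpha(R)/(\alpha(R) + n)$ and $n/(\alpha(R) + n)$, so a larger $\alpha(R)$ expresses more confidence in the prior guess.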

The minimum Bayes risk $S(\alpha)$ of $\tilde{\Delta}_n$ in (12) can be computed using results of
Korwar and Hollander (1976). Korwar and Hollander obtained the minimum Bayes
risk $R(\alpha)$ of the estimator $\hat{F}_n$ against weighted squared error loss to be

$$R(\alpha) = \big[ \alpha(R) / \{ (\alpha(R) + 1)(\alpha(R) + n) \} \big] \int F_0(x)(1 - F_0(x)) \, dW(x).$$

(See equation (2.19) of Hollander and Korwar (1976) and replace the m of
that equation with n here.) It immediately follows that $S(\alpha) = (b - a)^2 R(\alpha)$.

We note that we can also directly obtain the risk $T(\alpha)$ (say) of the one-sample
limit of Doksum's estimator (see equation (5)) with respect to the
Dirichlet process prior with parameter $\alpha$ in this case when $G$ is uniform.

We find

$$\Delta_n(x) = a + (b - a)F_n(x) - x, \quad x \in R,$$

where $F_n(x)$ is the empirical distribution function of the X's. Using (3.3)
of Korwar and Hollander (1976) we obtain $T(\alpha) = (b - a)^2 (1 + \alpha(R)/n) R(\alpha)$.
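
The two risks differ only by the factor $1 + \alpha(R)/n$, which the following sketch tabulates for a few sample sizes. The setting is an assumption chosen so the integral is available in closed form: $F_0$ uniform on $[0, 1]$ and $W$ Lebesgue measure on $[0, 1]$, for which $\int F_0 (1 - F_0) \, dW = 1/6$. The function names and the value $\alpha(R) = 2$ are likewise hypothetical.

```python
def bayes_risk_cdf(alpha_total, n, integral_f0):
    """R(alpha) = [alpha(R) / ((alpha(R) + 1)(alpha(R) + n))] * int F_0 (1 - F_0) dW."""
    return alpha_total / ((alpha_total + 1.0) * (alpha_total + n)) * integral_f0

def risks(a, b, alpha_total, n, integral_f0):
    """Return (S, T): risk of the Bayes estimator and of the one-sample limit."""
    r = bayes_risk_cdf(alpha_total, n, integral_f0)
    s = (b - a) ** 2 * r
    t = (b - a) ** 2 * (1.0 + alpha_total / n) * r
    return s, t

# Assumed setting: the integral of F_0 (1 - F_0) dW equals 1/6.
for n in (5, 50, 500):
    s, t = risks(0.0, 1.0, 2.0, n, 1.0 / 6.0)
    print(n, s, t, t / s)
```

The ratio $T(\alpha)/S(\alpha) = 1 + \alpha(R)/n$ tends to 1 as $n$ grows, so under this prior the advantage of the Bayes estimator is largest for small samples or large $\alpha(R)$.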